Release v0.10.0#853
Conversation
📝 WalkthroughWalkthroughThis PR adds per-sandbox persisted token-request secrets and a sandbox-keyed in-memory registry (register/verify by sandboxID), restores persisted secrets during boot when the host identity-token exists, injects generated secrets into container env and persists them on host, deletes persisted secret files on sandbox removal, updates token-server request handling to lookup sandboxID before verification, expands tests for registry/file persistence/recovery, and adds a v0.10.0 changelog entry dated 2026-06-09. Comment |
There was a problem hiding this comment.
Review: Release v0.10.0 Changelog Entry
I looked through this PR carefully. It's a focused, single-file change — 21 lines added, nothing deleted — that inserts the v0.10.0 release notes block into docs/docs/changelog.md directly above the existing v0.9.1 entry. That's exactly where it belongs.
What I checked
- Structure and placement: The new
## v0.10.0section is correctly inserted between the "Unreleased" block and## v0.9.1, following the same established format as every other release entry in this file (heading, italicised date, bold category sections, bullet items with PR links, trailing---separator). ✅ - Date: 2026-06-09, which is 5 days after v0.9.1 (2026-06-04). Chronologically consistent. ✅
- PR links: All referenced PRs (#831, #832, #834, #836, #838, #845, #846, #848, #850, #852) are linked back to the repo. Each entry cites the specific PR(s) responsible for the change. ✅
- No existing human reviews, no conflicts, not a draft, mergeable. ✅
- Content quality: The entries are detailed, user-facing, and clearly written. They cover the security model, migration path, and behaviour expectations — a high bar that matches the rest of this changelog. I especially appreciate the transparency around the workload identity feature (token location, env vars, algorithm choice, federation story) and the nuanced note on
logs -fdeduplication only engaging for interactive text output.
One minor observation
The logs -f collapses repeated lines bullet (line 26) contains an HTML <span> tag in the example text ([ Repeated Nx over <span> ]). This is almost certainly intentional — it looks like a placeholder for a dynamic duration string in the actual TUI output — and it renders fine in Markdown (the tag is escaped as \u003cspan\u003e in the raw file, which browsers will display as <span>). Nothing to fix, just worth being aware of if the docs site does any HTML sanitisation that might strip it.
Verdict
This is a clean, well-crafted changelog update that's ready to merge. Great work on making these release notes genuinely useful to readers — they tell the why and what changed for me, not just what was done.
…start A long-running sandbox authenticating via workload identity permanently lost token-server connectivity after the host's sandbox controller / token server restarted: the per-sandbox token server returned 403 "invalid token" for the sandbox's MIREN_IDENTITY_TOKEN_SECRET and could not self-heal without a restart (MIR-1235). The token server validates each request against an in-memory secret registry that was created empty on controller boot and only ever populated on the sandbox start path. Boot reconcile re-registered survivors with the token *refresher* but never with the secret registry, so every live sandbox's entry was lost on restart while the running process still held the original env secret forever. Fix (server-side only, no consumer change): - Persist each sandbox's token-request secret host-side (0600) on start, and reload + re-register it for surviving sandboxes during boot reconciliation, so the running process self-heals on the next controller restart. - Key the secret registry by sandbox identity instead of raw source IP. The caller's identity is resolved from the IP via the authoritative netdb lookup and the secret checked against that, closing IP-reuse edge cases. This also drops the IP plumbing from both register call sites. - Best-effort removal of the persisted secret on Delete so it doesn't linger if StopSandbox errors before its dir cleanup. - Only re-register survivors for token refresh when they actually have an identity-token file, so the refresh loop no longer spews write errors for sandboxes that predate workload identity.
Address CodeRabbit review on #854: - Trim trailing CR/LF in loadTokenSecret so a secret written by a text editor or fmt.Fprintln still matches the in-process env value. - Note in TestTokenServer_RecoversSecretAfterRestart why it writes to a plain t.TempDir() rather than the production sandboxPath.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
controllers/sandbox/sandbox.go (1)
2161-2175:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake secret persistence succeed before exporting the secret.
reconcileSandboxesOnBoot()can only restore token auth fromtoken-secreton disk. IfwriteTokenSecretfails here, the sandbox still starts withMIREN_IDENTITY_TOKEN_SECRETand works until the next controller restart, then falls back to the same permanent 403 state this PR is fixing. Fail creation here, or at least skip registering/exporting the secret unless the file write succeeds.Suggested fix
if c.tokenSecrets != nil && len(ep.Addresses) > 0 { secret, secretErr := generateTokenSecret() if secretErr != nil { c.Log.Warn("failed to generate token request secret", "sandbox", sb.ID, "error", secretErr) } else { - c.tokenSecrets.register(sb.ID.String(), secret) - envVars = append(envVars, fmt.Sprintf("MIREN_IDENTITY_TOKEN_SECRET=%s", secret)) - - // Persist the secret host-side so it can be re-registered after a - // controller/token-server restart. Without this the running sandbox's - // token requests 403 forever once the in-memory registry is lost. secretPath := c.sandboxPath(sb, tokenSecretFilename) if writeErr := writeTokenSecret(secretPath, secret); writeErr != nil { - c.Log.Warn("failed to persist token request secret", "sandbox", sb.ID, "error", writeErr) + return nil, fmt.Errorf("persisting token request secret: %w", writeErr) } + c.tokenSecrets.register(sb.ID.String(), secret) + envVars = append(envVars, fmt.Sprintf("MIREN_IDENTITY_TOKEN_SECRET=%s", secret)) } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controllers/sandbox/sandbox.go` around lines 2161 - 2175, Currently the code registers the token in c.tokenSecrets and appends MIREN_IDENTITY_TOKEN_SECRET to envVars before ensuring writeTokenSecret succeeded, which causes lost persistence on disk; change the flow in the block that handles generateTokenSecret()/secret (in reconcileSandboxesOnBoot-related creation code) so that you first call writeTokenSecret(secretPath, secret) and only if it returns nil do you call c.tokenSecrets.register(sb.ID.String(), secret) and append fmt.Sprintf("MIREN_IDENTITY_TOKEN_SECRET=%s", secret) to envVars; if writeTokenSecret fails, log the failure and either return an error to abort sandbox creation or skip registering/exporting the secret (do not register or add the env var unless the file write succeeded).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controllers/sandbox/sandbox.go`:
- Around line 2468-2475: The current code skips removing the persisted token
secret when c.tokenSecrets is nil; move the file-delete logic out of the
c.tokenSecrets != nil branch so the best-effort os.Remove of
filepath.Join(c.Tempdir, "containerd", id.PathSafe(), tokenSecretFilename)
always runs, but only call c.tokenSecrets.unregister(id.String()) when
c.tokenSecrets != nil; keep the existing error check (err != nil &&
!os.IsNotExist(err)) and c.Log.Warn("failed to remove persisted token secret",
"sandbox", id, "error", err) unchanged.
---
Outside diff comments:
In `@controllers/sandbox/sandbox.go`:
- Around line 2161-2175: Currently the code registers the token in
c.tokenSecrets and appends MIREN_IDENTITY_TOKEN_SECRET to envVars before
ensuring writeTokenSecret succeeded, which causes lost persistence on disk;
change the flow in the block that handles generateTokenSecret()/secret (in
reconcileSandboxesOnBoot-related creation code) so that you first call
writeTokenSecret(secretPath, secret) and only if it returns nil do you call
c.tokenSecrets.register(sb.ID.String(), secret) and append
fmt.Sprintf("MIREN_IDENTITY_TOKEN_SECRET=%s", secret) to envVars; if
writeTokenSecret fails, log the failure and either return an error to abort
sandbox creation or skip registering/exporting the secret (do not register or
add the env var unless the file write succeeded).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 306bff0d-3595-4413-9c4a-deddd152973c
📒 Files selected for processing (4)
controllers/sandbox/sandbox.gocontrollers/sandbox/sandbox_frozen_test.gocontrollers/sandbox/token_server.gocontrollers/sandbox/token_server_test.go
| if c.tokenSecrets != nil { | ||
| c.tokenSecrets.unregister(id.String()) | ||
| // Best-effort removal of the persisted secret. StopSandbox also wipes the whole | ||
| // sandbox dir, but removing the sensitive secret here ensures it doesn't linger | ||
| // if StopSandbox errors out before reaching its dir cleanup. | ||
| secretPath := filepath.Join(c.Tempdir, "containerd", id.PathSafe(), tokenSecretFilename) | ||
| if err := os.Remove(secretPath); err != nil && !os.IsNotExist(err) { | ||
| c.Log.Warn("failed to remove persisted token secret", "sandbox", id, "error", err) |
There was a problem hiding this comment.
Run the best-effort file delete even when tokenSecrets is nil.
WorkloadIssuer is optional on this controller, so it can still be cleaning up old sandboxes that already have persisted token-secret files while c.tokenSecrets is nil. In that configuration this branch is skipped, and any early StopSandbox failure leaves the secret on disk—the cleanup hole this change is meant to close.
Suggested fix
if c.tokenSecrets != nil {
c.tokenSecrets.unregister(id.String())
- // Best-effort removal of the persisted secret. StopSandbox also wipes the whole
- // sandbox dir, but removing the sensitive secret here ensures it doesn't linger
- // if StopSandbox errors out before reaching its dir cleanup.
- secretPath := filepath.Join(c.Tempdir, "containerd", id.PathSafe(), tokenSecretFilename)
- if err := os.Remove(secretPath); err != nil && !os.IsNotExist(err) {
- c.Log.Warn("failed to remove persisted token secret", "sandbox", id, "error", err)
- }
}
+ // Best-effort removal of the persisted secret. StopSandbox also wipes the whole
+ // sandbox dir, but removing the sensitive secret here ensures it doesn't linger
+ // if StopSandbox errors out before reaching its dir cleanup.
+ secretPath := filepath.Join(c.Tempdir, "containerd", id.PathSafe(), tokenSecretFilename)
+ if err := os.Remove(secretPath); err != nil && !os.IsNotExist(err) {
+ c.Log.Warn("failed to remove persisted token secret", "sandbox", id, "error", err)
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@controllers/sandbox/sandbox.go` around lines 2468 - 2475, The current code
skips removing the persisted token secret when c.tokenSecrets is nil; move the
file-delete logic out of the c.tokenSecrets != nil branch so the best-effort
os.Remove of filepath.Join(c.Tempdir, "containerd", id.PathSafe(),
tokenSecretFilename) always runs, but only call
c.tokenSecrets.unregister(id.String()) when c.tokenSecrets != nil; keep the
existing error check (err != nil && !os.IsNotExist(err)) and c.Log.Warn("failed
to remove persisted token secret", "sandbox", id, "error", err) unchanged.
Changelog update and version stamp for v0.10.0.
These are pure metadata changes (changelog markdown + version stamp), so CI can be skipped — the release workflow runs on the tag push.